plant specimen
AGP: A Novel Arabidopsis thaliana Genomics-Phenomics Dataset and its HyperGraph Baseline Benchmarking
Serna-Aguilera, Manuel, Goggin, Fiona L., Goswami, Aranyak, Bucksch, Alexander, Liu, Suxing, Luu, Khoa
Understanding which genes control which traits in an organism remains one of the central challenges in biology. Despite significant advances in data collection technology, our ability to map genes to traits is still limited. This genome-to-phenome (G2P) challenge spans several problem domains, including plant breeding, and requires models capable of reasoning over high-dimensional, heterogeneous, and biologically structured data. Currently, however, many datasets capture either genetic information or phenotype information, but not both. Additionally, phenotype data is highly heterogeneous, and many datasets do not fully capture that heterogeneity. The critical drawback is that these datasets are not integrated, that is, they do not link with each other to describe the same biological specimens. This limits how much machine learning models can learn about the various aspects of these specimens, narrowing the breadth of correlations they learn and therefore the accuracy of their predictions. To address this gap, we present the Arabidopsis Genomics-Phenomics (AGP) Dataset, a curated multi-modal dataset linking gene expression profiles with phenotypic trait measurements in Arabidopsis thaliana, a model organism in plant biology. AGP supports tasks such as phenotype prediction and interpretable graph learning. In addition, we benchmark conventional regression and explanatory baselines, including a biologically-informed hypergraph baseline, to validate gene-trait associations. To the best of our knowledge, this is the first dataset that provides multi-modal gene information and heterogeneous trait or phenotype data for the same Arabidopsis thaliana specimens. With AGP, we aim to help the research community accurately understand the connection between genotypes and phenotypes using gene information, higher-order gene pairings, and trait data from several sources.
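To make the idea of an integrated dataset concrete, the following is a minimal sketch, not the AGP format or the authors' hypergraph baseline. It assumes two hypothetical per-specimen tables (expression values and trait measurements) and a hypothetical set of gene groups treated as hyperedges. All specimen IDs, gene names, hyperedge names, and values are invented for illustration.

```python
from statistics import mean

# Hypothetical toy data: IDs, gene names, and values are invented.
expression = {
    "s1": {"geneA": 2.0, "geneB": 4.0, "geneC": 1.0},
    "s2": {"geneA": 3.0, "geneB": 5.0, "geneC": 2.0},
    "s3": {"geneA": 6.0, "geneB": 8.0, "geneC": 3.0},
}
traits = {
    "s1": {"leaf_area": 10.0},
    "s2": {"leaf_area": 14.0},
    "s4": {"leaf_area": 9.0},  # has no expression profile, so the join drops it
}
# A hyperedge groups any number of genes (for example a pathway).
# These groupings are assumptions, not taken from the paper.
hyperedges = {"pathway1": ["geneA", "geneB"], "pathway2": ["geneB", "geneC"]}


def link_specimens(expression, traits):
    """Inner join of the two modalities on specimen ID."""
    shared = sorted(expression.keys() & traits.keys())
    return {sid: (expression[sid], traits[sid]) for sid in shared}


def incidence_matrix(genes, hyperedges):
    """Rows are genes, columns are hyperedges (sorted by name); 1 marks membership."""
    edge_names = sorted(hyperedges)
    return [[1 if g in hyperedges[e] else 0 for e in edge_names] for g in genes]


def hyperedge_features(profile, hyperedges):
    """Mean expression of the member genes of each hyperedge."""
    return {e: mean(profile[g] for g in members)
            for e, members in sorted(hyperedges.items())}


linked = link_specimens(expression, traits)
genes = sorted(next(iter(expression.values())))
H = incidence_matrix(genes, hyperedges)
features = {sid: hyperedge_features(expr, hyperedges)
            for sid, (expr, _) in linked.items()}
```

Only specimens present in both tables survive the join, which is the property the abstract identifies as missing from non-integrated datasets. The hyperedge features could then feed any downstream regressor; that step is left out here because the source does not specify one.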
Studying plant-climate relationships using machine learning
Scientists from UNSW and Botanic Gardens of Sydney have trained AI to unlock data from millions of plant specimens kept in herbaria around the world, to study and combat the impacts of climate change on flora. "Herbarium collections are amazing time capsules of plant specimens," says lead author on the study, Associate Professor Will Cornwell. "Each year over 8000 specimens are added to the National Herbarium of New South Wales alone, so it's not possible to go through things manually anymore." Using a new machine learning algorithm to process over 3000 leaf samples, the team discovered that contrary to frequently observed interspecies patterns, leaf size doesn't increase in warmer climates within a single species. Published in the American Journal of Botany, this research not only reveals that factors other than climate have a strong effect on leaf size within a plant species, but demonstrates how AI can be used to transform static specimen collections and to quickly and effectively document climate change effects.